WIP: [NOT READY FOR REVIEW] build wheels with CUDA 13.0.x, test wheels against mix of CTK versions #2270
jameslamb wants to merge 11 commits into rapidsai:main from
Conversation
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ci/test_wheel_integrations.sh`:
- Around line 7-13: Replace the floating branch plus fixed /tmp clone path in
the git clone step: instead of cloning the mutable branch
"generate-pip-constraints" into "/tmp/gha-tools", pin the clone to an immutable
ref (a commit SHA or tag) and clone into a unique temporary directory (e.g.,
created with mktemp -d) so CI runs are reproducible; update the subsequent PATH
export that references /tmp/gha-tools to point to the new temp-dir variable and
ensure the script handles an existing directory by removing or updating it
before cloning.
In `@dependencies.yaml`:
- Around line 457-480: The CUDA 12.2 matrix pins torch==2.4.0+cu124 and uses
--extra-index-url=https://download.pytorch.org/whl/cu124 but those wheels do not
exist for Python 3.13/3.14; update the CUDA "12.2" matrix entry by either (a)
bumping the pinned package from torch==2.4.0+cu124 to a newer torch release that
provides cp313/cp314 wheels, or (b) remove the cu124-specific index and version
pin (the --extra-index-url entry and torch==2.4.0+cu124) so the matrix falls
back to a supported wheel (or match the same torch/version used in the other
cuda matrices like torch==2.9.0+cu129), ensuring the matrix with cuda: "12.2" no
longer references torch==2.4.0+cu124 or the cu124 index.
- Around line 279-282: The YAML matrix entry that sets use_cuda_wheels: "false"
currently has a bare `packages:` (null); change that `packages:` to an explicit
empty list `packages: []` for the matrix block where `use_cuda_wheels: "false"`
so the `packages` key is a list (not null) and avoids type validation issues.
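The null-vs-empty-list distinction can be seen directly with a YAML parser; a minimal sketch (assuming PyYAML is available):

```python
import yaml  # assumes PyYAML is available

# Hedged sketch of the pitfall: a bare `packages:` key parses to None,
# while `packages: []` parses to an (empty) list, which is what
# list-typed schema validation expects.
bare = yaml.safe_load("packages:")
explicit = yaml.safe_load("packages: []")
print(bare)      # {'packages': None}
print(explicit)  # {'packages': []}
```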
/ok to test
    -d "${TORCH_WHEEL_DIR}" \
    --constraint "${PIP_CONSTRAINT}" \
    --constraint ./torch-constraints.txt \
    'torch'
Just picking a place on the diff to have a threaded conversation.
Here's an interesting one... on the Python 3.14 + CUDA 13.1.1 + latest-dependencies jobs (arm64 and amd64), the solve is falling back to numba-cuda==0.24.0, which has an sdist but no Python 3.14 wheels, so it gets built from source and that build fails!
Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/04/51/8935ff9ae5150e1ffed945bf1b95002a6a5e1f9256aeb1143e1c159b68c5/numba_cuda-0.24.0.tar.gz (1.3 MB)
...
Installing build dependencies: started
Running command installing build dependencies for numba-cuda
...
Building wheels for collected packages: numba-cuda
...
g++ -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/tmp/pip-build-env-y5e9zevx/overlay/lib/python3.14/site-packages/numpy/_core/include -Inumba_cuda/numba/cuda/cext -I/pyenv/versions/3.14.3/include/python3.14 -c numba_cuda/numba/cuda/cext/_dispatcher.cpp -o build/temp.linux-x86_64-cpython-314/numba_cuda/numba/cuda/cext/_dispatcher.o -std=c++11
numba_cuda/numba/cuda/cext/_dispatcher.cpp:1018:2: error: #error "Python minor version is not supported."
1018 | #error "Python minor version is not supported."
...
Building wheel for numba-cuda (pyproject.toml): finished with status 'error'
ERROR: Failed building wheel for numba-cuda
Failed to build numba-cuda
error: failed-wheel-build-for-install
numba-cuda 0.26.0 was the first version with Python 3.14 wheels... something must be holding the solver back from using that.
ahhhh there it is. torch is ==-pinning cuda-bindings, which in turn constrains cuda-pathfinder
...
Collecting cuda-bindings==13.0.3 (from torch==2.10.0+cu130)
Collecting cuda-pathfinder~=1.1 (from cuda-bindings==13.0.3->torch==2.10.0+cu130)
...
Later numba-cuda[cu13] needs at least cuda-pathfinder>=1.3.1.
$ docker run --rm -it python:3.14 bash
$ pip install pkginfo
$ pip download --no-deps 'numba-cuda==0.26.0'
$ pkginfo --json ./numba_cuda*.whl
...
"requires_dist": [
"numba>=0.60.0",
"cuda-bindings<14.0.0,>=12.9.1",
"cuda-core<1.0.0,>=0.5.1",
"packaging",
"cuda-bindings<13.0.0,>=12.9.1; extra == \"cu12\"",
"cuda-pathfinder<2.0.0,>=1.3.1; extra == \"cu12\"",
"cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.*; extra == \"cu12\"",
"cuda-bindings==13.*; extra == \"cu13\"",
"cuda-pathfinder<2.0.0,>=1.3.1; extra == \"cu13\"",
"cuda-toolkit[cccl,cudart,nvjitlink,nvrtc,nvvm]==13.*; extra == \"cu13\""
],
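Those `extra == "cu13"` markers in the metadata are what make plain `numba-cuda` and `numba-cuda[cu13]` resolve differently; a minimal sketch using `packaging`, with the name and specifier copied from the dump above:

```python
from packaging.requirements import Requirement

# Hedged sketch: how the 'extra == "cu13"' marker gates a dependency.
# A plain `pip install numba-cuda` evaluates these markers with no extra,
# so neither the cu12- nor the cu13-only requirements apply.
req = Requirement('cuda-pathfinder<2.0.0,>=1.3.1; extra == "cu13"')
print(req.name, req.specifier)
print(req.marker.evaluate({"extra": "cu13"}))  # True
print(req.marker.evaluate({"extra": "cu12"}))  # False
```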
...numba-cuda[cu13]==0.24.0 doesn't constrain cuda-pathfinder
$ pip download --no-deps 'numba-cuda==0.24.0'
$ pkginfo --json ./numba_cuda-0.24.0*.tar.gz
...
"requires_dist": [
"numba>=0.60.0",
"cuda-bindings<14.0.0,>=12.9.1",
"cuda-core<1.0.0,>=0.3.2",
"packaging",
"cuda-bindings<13.0.0,>=12.9.1; extra == \"cu12\"",
"cuda-core<1.0.0,>=0.3.0; extra == \"cu12\"",
"cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.*; extra == \"cu12\"",
"cuda-bindings==13.*; extra == \"cu13\"",
"cuda-core<1.0.0,>=0.3.2; extra == \"cu13\"",
"cuda-toolkit[cccl,cudart,nvjitlink,nvrtc,nvvm]==13.*; extra == \"cu13\""
],
...Looks like that was added here: NVIDIA/numba-cuda#308
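To make that metadata diff explicit, a small sketch with `packaging` comparing the cu13 extras shown in the two dumps above (the requirement lists are copied from that output):

```python
from packaging.requirements import Requirement

# Hedged sketch: diff of the "cu13" extra's requirements between the two
# pkginfo dumps above (lists copied from the metadata shown).
cu13_0_24 = [
    "cuda-bindings==13.*",
    "cuda-core<1.0.0,>=0.3.2",
    "cuda-toolkit[cccl,cudart,nvjitlink,nvrtc,nvvm]==13.*",
]
cu13_0_26 = [
    "cuda-bindings==13.*",
    "cuda-pathfinder<2.0.0,>=1.3.1",
    "cuda-toolkit[cccl,cudart,nvjitlink,nvrtc,nvvm]==13.*",
]

def names(reqs):
    return {Requirement(r).name for r in reqs}

# New in 0.26.0's cu13 extra:
print(sorted(names(cu13_0_26) - names(cu13_0_24)))  # ['cuda-pathfinder']
```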
It looks like this wasn't caught on earlier PRs because CI fell back to a CPU-only torch 😬
Collecting torch>=2.10.0 (from -r test-pytorch-requirements.txt (line 4))
Obtaining dependency information for torch>=2.10.0 from http://pip-cache.local.gha-runners.nvidia.com/packages/69/2b/51e663ff190c9d16d4a8271203b71bc73a16aa7619b9f271a69b9d4a936b/torch-2.10.0-cp314-cp314-manylinux_2_28_aarch64.whl.metadata
Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/69/2b/51e663ff190c9d16d4a8271203b71bc73a16aa7619b9f271a69b9d4a936b/torch-2.10.0-cp314-cp314-manylinux_2_28_aarch64.whl.metadata (31 kB)
Alright, it goes deeper than this.
Just latest numba-cuda and latest torch are happily installable together on Python 3.14.
pip download \
--no-deps \
--index-url https://download.pytorch.org/whl/cu130 \
'torch==2.10.0+cu130'
pip install \
--prefer-binary \
./torch-*.whl \
'numba-cuda[cu13]>=0.22.1'
# Successfully installed ... cuda-bindings-13.0.3 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-toolkit-13.0.2 ... numba-0.64.0 numba-cuda-0.28.2 ... nvidia-cublas-13.1.0.3 nvidia-cuda-cccl-13.0.85 ... torch-2.10.0+cu130
I think the problem looks like this:
- torch-2.10+cu130 depends on a bunch of nvidia-{thing}=={version-from-CTK-13.0.2} wheels
- newer numba-cuda depends on cuda-toolkit[cccl,cudart,nvrtc,nvvm]==13.* (since "Set up a new VM-based CI infrastructure" NVIDIA/numba-cuda#604)
- in this CI job, we're constraining to cuda-toolkit==13.1.*
- the solver backtracks to numba-cuda==0.24.0 (which didn't have cuda-toolkit pinnings, and whose nvidia-nccl and similar dependencies are compatible with torch-2.10's)
- numba-cuda==0.24.0 didn't have wheels for Python 3.14, so pip tries to build it from source
- that build from source fails with the error above, which basically means "this doesn't support Python 3.14"
What we really want here is a big loud solver error that says "torch-2.10+cu130 only works with the packages pinned in cuda-toolkit==13.0.2, not installable here".
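That unsatisfiable combination can be sketched with `packaging` specifiers (the `==13.0.2` side is an assumption standing in for torch's effective CTK pinning via its nvidia-* wheels, not something torch declares directly):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Hedged sketch of the conflict described above. "==13.0.2" stands in for
# "the cuda-toolkit version the torch-2.10+cu130 nvidia-* pins line up
# with"; the CI constraint file pins cuda-toolkit==13.1.*. Their
# intersection is unsatisfiable -- the loud error we'd want pip to surface.
needed_by_torch_stack = SpecifierSet("==13.0.2")  # assumption for illustration
ci_constraint = SpecifierSet("==13.1.*")
combined = needed_by_torch_stack & ci_constraint

for candidate in ("13.0.2", "13.1.1"):
    print(candidate, Version(candidate) in combined)  # both False
```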
Alright on the latest build, here's what happened.
Looking at the most recent CI run (build link)
All of these environments look like I'd expect them to, and show we're covering a wide range of CTK versions.
NOTE: we end up using CTK 12.4 in the arm64 jobs with RAPIDS_CUDA_VERSION=12.2.2, because there weren't aarch64 cuBLAS wheels for earlier CTKs. CUDA 12.2 will have to be tested in nightlies (I'll do that next on this PR and add a follow-up comment).
Regular wheel tests
wheel-tests / 12.2.2, 3.11, arm64, ubuntu22.04, a100, latest-driver, latest-deps
details (click me)
(link)
Looks exactly like what we want... cuda-toolkit 12.4 (allowed on arm), nvJitLink 12.9, latest numba-cuda
Successfully installed ... cuda-bindings-12.9.5 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-12.9.5 cuda-toolkit-12.4.0 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-cu12-12.4.99 nvidia-cuda-nvcc-cu12-12.4.99 nvidia-cuda-nvrtc-cu12-12.4.99 nvidia-cuda-runtime-cu12-12.4.99 nvidia-nvjitlink-cu12-12.9.86 ... rmm-cu12-26.4.0a55
wheel-tests / 12.9.1, 3.11, amd64, ubuntu22.04, l4, latest-driver, oldest-deps
details (click me)
(link)
No cuda-toolkit, oldest numba-cuda (oldest-deps!), nvjitlink 12.9
Successfully installed ... cuda-bindings-12.9.5 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-12.9.5 ... numba-cuda-0.22.1 numpy-1.23.5 ... nvidia-nvjitlink-cu12-12.9.86 ...rmm-cu12-26.4.0a55
wheel-tests / 12.9.1, 3.14, amd64, ubuntu24.04, h100, latest-driver, latest-deps
details (click me)
(link)
Looks good, everything from CTK 12.9 and latest numba-cuda
Successfully installed ... cuda-bindings-12.9.5 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-12.9.5 cuda-toolkit-12.9.1... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-cu12-12.9.27 nvidia-cuda-nvcc-cu12-12.9.86 nvidia-cuda-nvrtc-cu12-12.9.86 nvidia-cuda-runtime-cu12-12.9.79 nvidia-nvjitlink-cu12-12.9.86 ... rmm-cu12-26.4.0a55
wheel-tests / 13.0.2, 3.12, amd64, ubuntu24.04, l4, latest-driver, latest-deps
details (click me)
(link)
Looks good, latest numba-cuda, cuda-toolkit 13.0, most CTK libraries from 13.0, nvJitLink from 13.1.
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.0.2 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.0.85 nvidia-cuda-nvrtc-13.0.88 nvidia-cuda-runtime-13.0.96 ... nvidia-nvjitlink-13.1.115 ... rmm-cu13-26.4.0a55
wheel-tests / 13.0.2, 3.12, arm64, rockylinux8, l4, latest-driver, latest-deps
details (click me)
(link)
Looks good, everything from CTK 13.0 and latest numba-cuda
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.0.2 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.0.85 nvidia-cuda-nvrtc-13.0.88 nvidia-cuda-runtime-13.0.96 nvidia-nvjitlink-13.1.115 nvidia-nvvm-13.0.88 ... rmm-cu13-26.4.0a55
wheel-tests / 13.1.1, 3.13, amd64, rockylinux8, rtxpro6000, latest-driver, latest-deps
details (click me)
(link)
Looks good, everything from CTK 13.1 and latest numba-cuda
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.1.1... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.1.115 nvidia-cuda-nvrtc-13.1.115 nvidia-cuda-runtime-13.1.80 nvidia-nvjitlink-13.1.115 nvidia-nvvm-13.1.115 ... rmm-cu13-26.4.0a55
wheel-tests / 13.1.1, 3.14, amd64, ubuntu24.04, rtxpro6000, latest-driver, latest-deps
details (click me)
(link)
Looks good, everything from CTK 13.1 and latest numba-cuda
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.1.1 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.1.115 nvidia-cuda-nvrtc-13.1.115 nvidia-cuda-runtime-13.1.80 nvidia-nvjitlink-13.1.115 nvidia-nvvm-13.1.115 ... rmm-cu13-26.4.0a55
wheel-tests / 13.1.1, 3.14, arm64, ubuntu24.04, l4, latest-driver, latest-deps
details (click me)
(link)
Looks good, everything from CTK 13.1 and latest numba-cuda
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.1.1... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.1.115 nvidia-cuda-nvrtc-13.1.115 nvidia-cuda-runtime-13.1.80 nvidia-nvjitlink-13.1.115 nvidia-nvvm-13.1.115 ... rmm-cu13-26.4.0a55
PyTorch / CuPy tests
wheel-tests-integration-optional / 12.2.2, 3.11, arm64, ubuntu22.04, a100, latest-driver, latest-deps
details (click me)
(link)
As expected, PyTorch skipped because this project doesn't test PyTorch versions old enough to run against CTK 12.2
Skipping PyTorch tests (requires CUDA 12.6-12.9 or 13.0, found 12.2.2)
CuPy tests pulled in latest numba-cuda, cuda-toolkit 12.4 (intentionally allowed on arm64, it's fine), and the latest 12.x nvjitlink (12.9). This looks like what we want!
Successfully installed ... cuda-bindings-12.9.5 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-12.9.5 cuda-toolkit-12.4.0 cupy-cuda12x-14.0.1 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-cu12-12.4.99 nvidia-cuda-nvcc-cu12-12.4.99 nvidia-cuda-nvrtc-cu12-12.4.99 nvidia-cuda-runtime-cu12-12.4.99 nvidia-nvjitlink-cu12-12.9.86 ... rmm-cu12-26.4.0a55
wheel-tests-integration-optional / 12.9.1, 3.11, amd64, ubuntu22.04, l4, latest-driver, oldest-deps
details (click me)
(link)
For PyTorch tests, no cuda-toolkit was installed in the environment; the solve fell all the way back to numba-cuda==0.22.1 (makes sense, oldest-deps!) and used nvidia-* packages from CTK 12.9.
Successfully installed ... cuda-bindings-12.9.5 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-12.9.5 ... numba-cuda-0.22.1 numpy-1.23.5 nvidia-cublas-cu12-12.9.1.4 ... nvidia-nvjitlink-cu12-12.9.86 ... rmm-cu12-26.4.0a55 ... torch-2.9.0+cu129 ...
CuPy tests downgraded CuPy to 13.6.0 (makes sense, oldest-deps!) and that brought fastrlock down with it.
Successfully installed cupy-cuda12x-13.6.0 fastrlock-0.8.3
wheel-tests-integration-optional / 12.9.1, 3.14, amd64, ubuntu24.04, h100, latest-driver, latest-deps
details (click me)
For PyTorch tests, cuda-toolkit 12.9 gets installed. It's the correct version (12.9) and we see the expected versions of CTK libraries, like cuBLAS 12.9 and nvJitLink 12.9.
Successfully installed ... cuda-bindings-12.9.4 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-12.9.4 cuda-toolkit-12.9.1 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cublas-cu12-12.9.1.4 ... nvidia-nvjitlink-cu12-12.9.86 ... rmm-cu12-26.4.0a55 ... torch-2.10.0+cu129 ...
CuPy tests kept everything in that environment and just added CuPy
Successfully installed cupy-cuda12x-14.0.1
wheel-tests-integration-optional / 13.0.2, 3.12, amd64, ubuntu24.04, l4, latest-driver, latest-deps
details (click me)
(link)
For PyTorch tests, the solve pulled in cuda-toolkit 13.0.2, the latest numba-cuda (0.28.2), and CTK 13.0 packages (e.g. cuBLAS 13.1.0.3, nvJitLink 13.0.88). See https://docs.nvidia.com/cuda/archive/13.0.2/cuda-toolkit-release-notes/index.html to confirm those versions.
Successfully installed ... cuda-bindings-13.0.3 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.0.3 cuda-toolkit-13.0.2 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cublas-13.1.0.3 ... nvidia-nvjitlink-13.0.88 ... rmm-cu13-26.4.0a55 ... torch-2.10.0+cu130 ...
CuPy tests kept everything in that environment and just added CuPy
Successfully installed cupy-cuda13x-14.0.1
wheel-tests-integration-optional / 13.0.2, 3.12, arm64, rockylinux8, l4, latest-driver, latest-deps
details (click me)
(link)
PyTorch tests pulled in cuda-toolkit==13.0.2 and CTK 13.0 libraries (including nvJitLink 13.0)
Successfully installed ... cuda-bindings-13.0.3 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.0.3 cuda-toolkit-13.0.2 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cublas-13.1.0.3 ... nvidia-nvjitlink-13.0.88 ... rmm-cu13-26.4.0a55 ... torch-2.10.0+cu130 ...
CuPy tests kept everything in that environment and just added CuPy
Successfully installed cupy-cuda13x-14.0.1
wheel-tests-integration-optional / 13.1.1, 3.13, amd64, rockylinux8, rtxpro6000, latest-driver, latest-deps
details (click me)
(link)
As expected, skipped because there aren't PyTorch wheels supporting CUDA 13.1 yet.
Skipping PyTorch tests (requires CUDA 12.6-12.9 or 13.0, found 13.1.1)
CuPy tests pulled in cuda-toolkit 13.1, latest numba-cuda (0.28.2), and corresponding nvidia-* libraries.
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.1.1 cupy-cuda13x-14.0.1 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.1.115 nvidia-cuda-nvrtc-13.1.115 nvidia-cuda-runtime-13.1.80 nvidia-nvjitlink-13.1.115 ... rmm-cu13-26.4.0a55
wheel-tests-integration-optional / 13.1.1, 3.14, amd64, ubuntu24.04, rtxpro6000, latest-driver, latest-deps
details (click me)
(link)
As expected, skipped because there aren't PyTorch wheels supporting CUDA 13.1 yet.
Skipping PyTorch tests (requires CUDA 12.6-12.9 or 13.0, found 13.1.1)
CuPy tests pulled in cuda-toolkit 13.1, latest numba-cuda (0.28.2), and corresponding nvidia-* libraries.
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.1.1 cupy-cuda13x-14.0.1 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.1.115 nvidia-cuda-nvrtc-13.1.115 nvidia-cuda-runtime-13.1.80 nvidia-nvjitlink-13.1.115 ... rmm-cu13-26.4.0a55
wheel-tests-integration-optional / 13.1.1, 3.14, arm64, ubuntu24.04, l4, latest-driver, latest-deps
details (click me)
(link)
As expected, skipped because there aren't PyTorch wheels supporting CUDA 13.1 yet.
Skipping PyTorch tests (requires CUDA 12.6-12.9 or 13.0, found 13.1.1)
CuPy tests pulled in cuda-toolkit 13.1, latest numba-cuda (0.28.2), and corresponding nvidia-* libraries.
Successfully installed ... cuda-bindings-13.1.1 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-13.1.1 cuda-toolkit-13.1.1 cupy-cuda13x-14.0.1... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-13.1.115 nvidia-cuda-nvrtc-13.1.115 nvidia-cuda-runtime-13.1.80 nvidia-nvjitlink-13.1.115 ... rmm-cu13-26.4.0a55
(updates from testing with nightly matrix)
wheel-tests / 12.2.2, 3.11, amd64, ubuntu22.04, v100, earliest-driver, latest-deps
(link)
Looks exactly like what we want... cuda-toolkit 12.2, 12.2 versions of most CTK libraries, nvJitLink 12.9, latest numba-cuda.
Successfully installed ... cuda-bindings-12.9.5 cuda-core-0.6.0 cuda-pathfinder-1.4.0 cuda-python-12.9.5 cuda-toolkit-12.2.2 ... numba-cuda-0.28.2 numpy-2.4.2 nvidia-cuda-cccl-cu12-12.2.140 nvidia-cuda-nvcc-cu12-12.2.140 nvidia-cuda-nvrtc-cu12-12.2.140 nvidia-cuda-runtime-cu12-12.2.140 nvidia-nvjitlink-cu12-12.9.86 ... rmm-cu12-26.4.0a57
Actionable comments posted: 1
♻️ Duplicate comments (1)
ci/test_wheel_integrations.sh (1)
7-13: ⚠️ Potential issue | 🟠 Major
Pin gha-tools to an immutable ref and avoid the fixed /tmp clone path before merge. This still uses a mutable branch and a shared absolute path, which makes runs less reproducible and can fail if the path already exists.
Suggested hardening

 # TODO(jameslamb): revert before merging
-git clone --branch generate-pip-constraints \
-    https://github.com/rapidsai/gha-tools.git \
-    /tmp/gha-tools
-
-export PATH="/tmp/gha-tools/tools:${PATH}"
+GHA_TOOLS_DIR="$(mktemp -d)"
+GHA_TOOLS_REF="${GHA_TOOLS_REF:-<pin-a-tag-or-commit-sha>}"
+git clone --depth 1 https://github.com/rapidsai/gha-tools.git "${GHA_TOOLS_DIR}"
+git -C "${GHA_TOOLS_DIR}" checkout "${GHA_TOOLS_REF}"
+
+export PATH="${GHA_TOOLS_DIR}/tools:${PATH}"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ci/test_wheel_integrations.sh` around lines 7 - 13, Replace the mutable branch clone and fixed /tmp path: change the git clone call that currently uses "--branch generate-pip-constraints" and destination "/tmp/gha-tools" to clone a specific immutable commit SHA (pin to a known commit) and clone into a unique directory (e.g., a mktemp/mkdir under the job workspace or a generated tempdir) instead of "/tmp/gha-tools"; then update the export PATH line that references "/tmp/gha-tools/tools" to point at the new tempdir's tools subdirectory and ensure the script cleans up the tempdir when finished.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ci/test_wheel_integrations.sh`:
- Line 73: Update the skip log invoked via rapids-logger so the message matches
the actual gate (CUDA must be >=12.6 and <13.1 or 13.0); replace the current
text "Skipping PyTorch tests (requires CUDA <13.1, found
${RAPIDS_CUDA_VERSION})" with a message that states the full allowed range, e.g.
"Skipping PyTorch tests (requires CUDA >=12.6 and <13.1, found
${RAPIDS_CUDA_VERSION})", keeping the rapids-logger call and the
${RAPIDS_CUDA_VERSION} interpolation intact.
---
Duplicate comments:
In `@ci/test_wheel_integrations.sh`:
- Around line 7-13: Replace the mutable branch clone and fixed /tmp path: change
the git clone call that currently uses "--branch generate-pip-constraints" and
destination "/tmp/gha-tools" to clone a specific immutable commit SHA (pin to a
known commit) and clone into a unique directory (e.g., a mktemp/mkdir under the
job workspace or a generated tempdir) instead of "/tmp/gha-tools"; then update
the export PATH line that references "/tmp/gha-tools/tools" to point at the new
tempdir's tools subdirectory and ensure the script cleans up the tempdir when
finished.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4e302f1e-2607-40a8-85db-ed34941bca20
📒 Files selected for processing (2)
ci/test_wheel_integrations.sh
dependencies.yaml
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
♻️ Duplicate comments (1)
ci/test_wheel_integrations.sh (1)
7-13: ⚠️ Potential issue | 🟠 Major
Pin gha-tools to an immutable ref and stop cloning into a fixed /tmp path. Line 8 currently pulls a mutable branch into /tmp/gha-tools, which keeps CI non-reproducible and can fail on path reuse. Please switch to a unique temp dir and a pinned tag/SHA before merge.
Suggested hardening

-# TODO(jameslamb): revert before merging
-git clone --branch generate-pip-constraints \
-    https://github.com/rapidsai/gha-tools.git \
-    /tmp/gha-tools
-
-export PATH="/tmp/gha-tools/tools:${PATH}"
+# TODO(jameslamb): revert before merging
+GHA_TOOLS_DIR="$(mktemp -d)"
+GHA_TOOLS_REF="${GHA_TOOLS_REF:-<pin-a-tag-or-commit-sha>}"
+git clone --depth 1 https://github.com/rapidsai/gha-tools.git "${GHA_TOOLS_DIR}"
+git -C "${GHA_TOOLS_DIR}" checkout "${GHA_TOOLS_REF}"
+
+export PATH="${GHA_TOOLS_DIR}/tools:${PATH}"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ci/test_wheel_integrations.sh` around lines 7 - 13, Replace the mutable branch clone into the fixed /tmp/gha-tools path by cloning a pinned immutable ref into a unique temporary directory: create a temp dir (e.g., via mktemp -d), git clone using a specific tag or commit SHA instead of the branch name (replace "generate-pip-constraints" with the chosen tag/SHA), point PATH at the temp dir's tools subdir (export PATH="$TEMP_DIR/tools:${PATH}"), and ensure the temp dir is cleaned up after the script finishes; update the git clone and export PATH lines accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@ci/test_wheel_integrations.sh`:
- Around line 7-13: Replace the mutable branch clone into the fixed
/tmp/gha-tools path by cloning a pinned immutable ref into a unique temporary
directory: create a temp dir (e.g., via mktemp -d), git clone using a specific
tag or commit SHA instead of the branch name (replace "generate-pip-constraints"
with the chosen tag/SHA), point PATH at the temp dir's tools subdir (export
PATH="$TEMP_DIR/tools:${PATH}"), and ensure the temp dir is cleaned up after the
script finishes; update the git clone and export PATH lines accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 12ddf0fc-8137-45a2-9f85-631dc3000aef
📒 Files selected for processing (1)
ci/test_wheel_integrations.sh
/ok to test
Contributes to rapidsai/build-planning#256

`rapids-generate-pip-constraints` currently special-cases `RAPIDS_DEPENDENCIES="latest"` and skips generating constraints in that case. This will be helpful in rapidsai/build-planning#256, where we want to start constraining `cuda-toolkit` in wheels CI based on the CTK version in the CI image being used.

## Notes for Reviewers

### How I tested this

Looked for projects using this ([GitHub search](https://github.com/search?q=org%3Arapidsai+language%3AShell+%22rapids-generate-pip-constraints%22+AND+NOT+is%3Aarchived+&type=code)) and tested in them. It's just a few:

* [ ] cudf (rapidsai/cudf#21639)
* [ ] cuml (rapidsai/cuml#7853)
* [ ] dask-cuda (rapidsai/dask-cuda#1632)
* [ ] nvforest (rapidsai/nvforest#62)
* [ ] raft (rapidsai/raft#2971)
* [ ] rmm (rapidsai/rmm#2270)

On all of those, wheels CI jobs worked exactly as expected and without needing any code changes or `dependencies.yaml` updates... so this PR is safe to merge any time.

### Is this safe?

It should be (see "How I tested this"). This is only used to add **constraints** (not requirements), so it shouldn't change our ability to catch problems like "forgot to declare a dependency" in CI.

It WILL increase the risk of `[test]` extras being underspecified. For example, if `cuml[test]` has `scikit-learn>=1.3` and the constraints have `scikit-learn>=1.5`, we might never end up testing `scikit-learn>=1.3,<1.5` (unless it's explicitly accounted for in a `dependencies: "oldest"` block).

The other risk here is that this creates friction because constraints passed to `--constraint` cannot contain extras. So e.g. if you want to depend on `xgboost[dask]`, that cannot be in any of the lists generated by `rapids-generate-pip-constraints`. I think we can work around that though when we hit those cases.

Overall, I think these are acceptable tradeoffs.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #247
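The "constraints cannot contain extras" friction mentioned above can be sketched with `packaging` (the `xgboost[dask]` requirement is a hypothetical example):

```python
from packaging.requirements import Requirement

# Hedged sketch: a tool generating a constraints file has to drop the
# extras from a requirement like "xgboost[dask]>=2.0" (hypothetical
# example) before writing it out, because pip rejects extras in
# --constraint files.
req = Requirement("xgboost[dask]>=2.0")
constraint = f"{req.name}{req.specifier}"  # extras (req.extras) are dropped
print(constraint)          # xgboost>=2.0
print(sorted(req.extras))  # ['dask']
```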
Description
Contributes to rapidsai/build-planning#257
Contributes to rapidsai/build-planning#256
Other changes
- Make `torch` CUDA wheel installation stricter (using patterns borrowed from "wheels CI: stricter torch index selection, test oldest versions of dependencies" cugraph-gnn#413)

Checklist